Statistical Machine Transliteration with Multi-to-Multi Joint Source Channel Model

نویسندگان

  • Yu Chen
  • Rui Wang
  • Yi Zhang
چکیده

This paper describes DFKI’s participation in the NEWS2011 shared task on machine transliteration. Our primary system participated in the evaluation for English-Chinese and Chinese-English language pairs. We extended the joint sourcechannel model on the transliteration task into a multi-to-multi joint source-channel model, which allows alignments between substrings of arbitrary lengths in both source and target strings. When the model is integrated into a modified phrasebased statistical machine translation system, around 20% of improvement is observed. The primary system achieved 0.320 on English-Chinese and 0.133 on Chinese-English in terms of top-1 accuracy.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...

متن کامل

A Hybrid Word Alignment Model for Phrase-Based Statistical Machine Translation

This paper proposes a hybrid word alignment model for Phrase-Based Statistical Machine translation (PB-SMT). The proposed hybrid alignment model provides most informative alignment links which are offered by both unsupervised and semi-supervised word alignment models. Two unsupervised word alignment models (GIZA++ and Berkeley aligner) and a rule based aligner are combined together. The rule ba...

متن کامل

A Modified Joint Source-Channel Model for Transliteration

Most machine transliteration systems transliterate out of vocabulary (OOV) words through intermediate phonemic mapping. A framework has been presented that allows direct orthographical mapping between two languages that are of different origins employing different alphabet sets. A modified joint source–channel model along with a number of alternatives have been proposed. Aligned transliteration...

متن کامل

Combining a Two-step Conditional Random Field Model and a Joint Source Channel Model for Machine Transliteration

This paper describes our system for “NEWS 2009 Machine Transliteration Shared Task” (NEWS 2009). We only participated in the standard run, which is a direct orthographical mapping (DOP) between two languages without using any intermediate phonemic mapping. We propose a new two-step conditional random field (CRF) model for DOP machine transliteration, in which the first CRF segments a source wor...

متن کامل

A Joint Source-Channel Model for Machine Transliteration

Most foreign names are transliterated into Chinese, Japanese or Korean with approximate phonetic equivalents. The transliteration is usually achieved through intermediate phonemic mapping. This paper presents a new framework that allows direct orthographical mapping (DOM) between two different languages, through a joint source-channel model, also called n-gram transliteration model (TM). With t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011